
    Mobile AR Depth Estimation: Challenges & Prospects -- Extended Version

    Metric depth estimation plays an important role in mobile augmented reality (AR). With accurate metric depth, we can achieve more realistic user interactions such as object placement and occlusion detection. While specialized hardware like LiDAR shows promise, its restricted availability (only on selected high-end mobile devices) and its performance limitations, such as limited range and sensitivity to the environment, make it less than ideal. Monocular depth estimation, on the other hand, relies solely on mobile cameras, which are ubiquitous, making it a promising alternative for mobile AR. In this paper, we investigate the challenges and opportunities of achieving accurate metric depth estimation in mobile AR. We tested four state-of-the-art monocular depth estimation models on a newly introduced dataset (ARKitScenes) and identified three types of challenges: hardware-, data-, and model-related. Furthermore, our research suggests promising future directions for addressing those challenges: (i) using more hardware-related information from the mobile device's camera and other available sensors, (ii) capturing high-quality data that reflects real-world AR scenarios, and (iii) designing a model architecture to utilize the new information.
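One common way to exploit the hardware-related information mentioned in direction (i) is to align a model's relative depth output to sparse metric anchors (e.g. a few LiDAR or ARKit depth samples) via a least-squares scale-and-shift fit. This is a generic technique, sketched here with numpy, not necessarily the method the paper proposes:

```python
import numpy as np

def align_scale_shift(relative_depth, sparse_metric, mask):
    """Fit scale s and shift t so that s * relative_depth + t matches the
    sparse metric anchors (where mask is True), then apply to the full map."""
    d = relative_depth[mask]
    z = sparse_metric[mask]
    A = np.stack([d, np.ones_like(d)], axis=1)   # design matrix [d, 1]
    (s, t), *_ = np.linalg.lstsq(A, z, rcond=None)
    return s * relative_depth + t

# toy example: "relative" depth is metric depth distorted by an unknown
# affine transform, with a sparse 3x3 grid of metric anchors
rng = np.random.default_rng(0)
true_metric = rng.uniform(0.5, 5.0, size=(8, 8))
relative = (true_metric - 0.3) / 2.0             # unknown scale/shift
mask = np.zeros((8, 8), dtype=bool)
mask[::3, ::3] = True                            # sparse sensor samples
metric_pred = align_scale_shift(relative, true_metric, mask)
```

Because the distortion here is exactly affine, the fit recovers the metric depth everywhere from only nine anchor pixels; with real sensor noise the alignment is approximate.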

    Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

    Automatic emotion recognition from speech, an important and challenging task in the field of affective computing, relies heavily on the effectiveness of the speech features used for classification. Previous approaches to emotion recognition have mostly focused on the extraction of carefully hand-crafted features, and how to model spatio-temporal dynamics for speech emotion recognition effectively is still under active investigation. In this paper, we propose a method for emotion-relevant feature extraction from speech that combines attention-based bidirectional Long Short-Term Memory recurrent neural networks with fully convolutional networks to automatically learn the best spatio-temporal representations of speech signals. The learned high-level features are then fed into a deep neural network (DNN) to predict the final emotion. Experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpora show that our method provides more accurate predictions than other existing emotion recognition algorithms.
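The attention mechanism in architectures like this typically pools a variable number of frame-level features (e.g. BiLSTM outputs) into a single utterance-level vector. A minimal numpy sketch of such attention-weighted pooling, with hypothetical parameter shapes rather than the paper's actual configuration:

```python
import numpy as np

def attention_pool(frames, W, v):
    """Attention-weighted pooling: score_t = v . tanh(W @ h_t), softmax
    over time, then a weighted sum of the frame features."""
    scores = np.tanh(frames @ W.T) @ v           # one scalar per frame, (T,)
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ frames                      # utterance vector, (D,)

rng = np.random.default_rng(1)
T, D, A = 50, 16, 8                              # frames, feature dim, attention dim
frames = rng.standard_normal((T, D))             # stand-in for BiLSTM outputs
W = rng.standard_normal((A, D))
v = rng.standard_normal(A)
utterance_vec = attention_pool(frames, W, v)
```

In training, W and v are learned jointly with the rest of the network; the pooled vector is what gets fed to the final DNN classifier.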

    Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

    Latent diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion (Rombach et al.). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such an ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768, 2-4-step LCM takes only 32 A100 GPU hours to train. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io
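The few-step inference loop of a consistency-style model alternates between one network evaluation that jumps directly to a predicted clean sample and a re-noising step down to the next timestep. A toy numpy sketch of that multistep sampling loop, with a placeholder consistency function standing in for the distilled latent-space network (not the paper's actual implementation):

```python
import numpy as np

def sample_lcm(consistency_fn, shape, timesteps, rng):
    """Few-step consistency sampling: at each step the model predicts the
    clean sample directly (one network call), which is then re-noised to
    the next, lower timestep. consistency_fn(x, t) -> x0_hat."""
    x = rng.standard_normal(shape) * timesteps[0]   # start from pure noise
    for i, t in enumerate(timesteps):
        x0_hat = consistency_fn(x, t)               # single network evaluation
        if i + 1 < len(timesteps):
            t_next = timesteps[i + 1]
            x = x0_hat + t_next * rng.standard_normal(shape)  # re-noise
        else:
            x = x0_hat
    return x

# placeholder consistency function: shrinks its input toward the origin
toy_fn = lambda x, t: x / (1.0 + t)
rng = np.random.default_rng(0)
sample = sample_lcm(toy_fn, (4, 4), timesteps=[8.0, 4.0, 2.0, 1.0], rng=rng)
```

With four timesteps, the loop makes exactly four network calls, which is what makes 2-4-step generation so much cheaper than the hundreds of iterations typical of standard diffusion sampling.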

    Melodic Phrase Segmentation By Deep Neural Networks

    Automated melodic phrase detection and segmentation is a classical task in content-based music information retrieval and a key step toward automated music structure analysis. However, traditional methods still cannot satisfy practical requirements. In this paper, we explore and adapt various neural network architectures to see whether they can be generalized to work with the symbolic representation of music and produce satisfactory melodic phrase segmentation. The main issue in applying deep-learning methods to phrase detection is the sparse labeling of training sets. We propose two tailored label-engineering schemes with corresponding training techniques for different neural networks in order to make decisions at the sequence level. Experimental results show that the CNN-CRF architecture performs best, offering finer segmentation and faster training, while CNN, Bi-LSTM-CNN, and Bi-LSTM-CRF are acceptable alternatives.
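One simple way to soften the sparse-labeling problem mentioned above is to widen each boundary label so that a few neighbouring time steps also count as positive, easing the extreme class imbalance a sequence labeller sees. This is an illustrative sketch of the general idea, not the paper's specific label-engineering schemes:

```python
def widen_boundary_labels(boundaries, width=2):
    """Given a sparse 0/1 phrase-boundary sequence, mark `width` neighbours
    on each side of every boundary as positive as well."""
    n = len(boundaries)
    widened = [0] * n
    for i, b in enumerate(boundaries):
        if b == 1:
            for j in range(max(0, i - width), min(n, i + width + 1)):
                widened[j] = 1
    return widened

# two phrase boundaries in a ten-step melody
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
wide = widen_boundary_labels(labels, width=1)   # [0,0,1,1,1,0,0,1,1,1]
```

At inference time, the widened predictions would be collapsed back to single boundary positions, e.g. by taking the peak of each positive run.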

    An edge cloud and Fibonacci-Diffie-Hellman encryption scheme for secure printer data transmission

    Network printers face increasing security threats from network attacks that can lead to sensitive information leakage and data tampering. To address these risks, we propose a novel Fibonacci-Diffie-Hellman (FIB-DH) encryption scheme using edge cloud collaboration. Our approach combines properties of third-order Fibonacci matrices with the Diffie-Hellman key exchange to encrypt printer data transmissions. The encrypted data is transmitted via edge cloud servers and verified by the receiver using inverse Fibonacci transforms. Our experiments demonstrate that the FIB-DH scheme effectively improves printer data transmission security against common attacks compared with conventional methods, showing reduced vulnerability to leakage and tampering. This work provides an innovative application of cryptographic techniques to strengthen security for network printer communications.
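The general shape of such a scheme can be sketched as follows: a Diffie-Hellman exchange produces a shared secret, which selects a power of the third-order Fibonacci (tribonacci) Q-matrix as the encryption matrix; decryption uses its inverse (the Q-matrix has determinant 1, so its powers are invertible modulo any prime). All parameters below are toy values for illustration, and this is a plausible reading of the construction rather than the paper's exact algorithm:

```python
import random

# third-order Fibonacci (tribonacci) Q-matrix; det(Q) = 1, so every power
# of Q is invertible modulo any prime
Q = [[1, 1, 1], [1, 0, 0], [0, 1, 0]]
Q_INV = [[0, 1, 0], [0, 0, 1], [1, -1, -1]]      # exact integer inverse of Q

def mat_mul(a, b, p):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) % p
             for j in range(3)] for i in range(3)]

def mat_pow(m, e, p):
    r = [[int(i == j) for j in range(3)] for i in range(3)]
    while e:                                     # square-and-multiply
        if e & 1:
            r = mat_mul(r, m, p)
        m = mat_mul(m, m, p)
        e >>= 1
    return r

P, G = 2089, 7                                   # toy Diffie-Hellman parameters
a, b = random.randrange(2, P), random.randrange(2, P)
shared = pow(pow(G, a, P), b, P)                 # equals pow(pow(G, b, P), a, P)

key = mat_pow(Q, shared, P)                      # encryption matrix Q^k mod P
key_inv = mat_pow(Q_INV, shared, P)              # inverse transform (Q^-1)^k

block = [72, 105, 33]                            # three bytes of printer data
cipher = [sum(block[k] * key[k][j] for k in range(3)) % P for j in range(3)]
plain = [sum(cipher[k] * key_inv[k][j] for k in range(3)) % P for j in range(3)]
```

Since Q^k and (Q^-1)^k are inverses modulo P, the receiver's inverse Fibonacci transform recovers the original block exactly; real deployments would of course use far larger group parameters.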

    Association of Geriatric Nutritional Risk Index with Mortality in Hemodialysis Patients: A Meta-Analysis of Cohort Studies

    Background/Aims: The geriatric nutritional risk index (GNRI) was developed as a "nutrition-related" risk index and has been reported in different populations to be associated with the risk of all-cause and cardiovascular morbidity and mortality. GNRI can therefore be used to classify patients according to their risk of complications related to protein-energy wasting (PEW). However, not all reports support the prognostic ability of the GNRI. The purpose of this study was to assess the association of GNRI with mortality in chronic hemodialysis patients. Methods: We electronically searched the PubMed, Embase, and Cochrane Library databases for original articles published in peer-reviewed journals from inception to September 2018. The primary outcomes were all-cause and cardiovascular mortality. We pooled unadjusted and adjusted odds ratios (ORs) with 95% confidence intervals (95% CIs) using Review Manager 5.3 software. Results: A total of 10,739 patients from 19 cohort studies published from 2010 to 2018 were included. A significant negative association was found between the GNRI and all-cause mortality in patients on chronic hemodialysis (OR, 0.90; 95% CI, 0.84-0.97; p=0.004, per unit increase; OR, 2.15; 95% CI, 1.88-2.46; p<0.00001, low vs. high GNRI). Moreover, there was also a significant negative association between the GNRI (per unit increase) and cardiovascular events (OR, 0.98; 95% CI, 0.97-1.00; p=0.01), as well as cardiovascular mortality (OR, 0.89; 95% CI, 0.80-0.99; p=0.03). Conclusion: Our findings support the hypothesis that a low GNRI is associated with an increased risk of all-cause and cardiovascular mortality in chronic hemodialysis patients. Based on our literature review, GNRI is an effective tool for identifying patients with a nutrition-related risk of all-cause and cardiovascular disease.
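The pooling of odds ratios described above is conventionally done on the log scale with inverse-variance weights, recovering each study's standard error from its 95% CI. A minimal fixed-effect sketch with made-up study numbers (not the values from this meta-analysis, and simpler than what Review Manager 5.3 does, which also supports random-effects models):

```python
import math

def pooled_or(ors, cis):
    """Fixed-effect inverse-variance pooling of odds ratios on the log
    scale. Each study's SE is recovered from its 95% CI as
    se = (ln(hi) - ln(lo)) / (2 * 1.96)."""
    wsum, num = 0.0, 0.0
    for or_, (lo, hi) in zip(ors, cis):
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        w = 1.0 / se ** 2                        # inverse-variance weight
        num += w * math.log(or_)
        wsum += w
    log_pool = num / wsum
    se_pool = math.sqrt(1.0 / wsum)
    return (math.exp(log_pool),                  # pooled OR
            math.exp(log_pool - 1.96 * se_pool), # lower 95% bound
            math.exp(log_pool + 1.96 * se_pool)) # upper 95% bound

# three hypothetical per-unit-increase studies
est, lo, hi = pooled_or([0.88, 0.92, 0.90],
                        [(0.80, 0.97), (0.85, 1.00), (0.82, 0.99)])
```

Pooling on the log scale is what keeps the combined estimate and its CI symmetric in ratio terms, which is why meta-analyses report ORs rather than averaging them directly.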